We suggest an adaptive sampling rule for obtaining information
from noisy signals using wavelet methods. The technique involves
increasing the sampling rate when relatively high-frequency terms
are incorporated into the wavelet estimator, and decreasing it when,
again using thresholded terms as an empirical guide, signal complexity
is judged to have decreased. Through sampling in this way the algorithm
is able to accurately recover relatively complex signals without
increasing the long-run average expense of sampling. It achieves
this level of performance by exploiting the opportunities for near-real
time sampling that are available if one uses a relatively high primary
resolution level when constructing the basic wavelet estimator. In the
practical problems that motivate the work, where signal to noise ratio
is particularly high and the long-run average sampling rate may
be several hundred thousand operations per second, high primary
resolution levels are quite feasible.


1. Introduction. In this paper we suggest methods for online signal
recovery, when a noisy signal is sampled at discrete times and either the raw
data, or an estimator of the signal computed from the raw data, is recorded
or transmitted after a relatively short time delay. There may be no
opportunity to go back and re-sample the signal if it transpires that parts of the
signal are so complex that insufficient information was acquired in the first
sampling operation. However, there is a possibility of increasing the
sampling rate online, if, at the current sampling time, it appears that the rate
is insufficient to capture important features of the signal. Nevertheless, the
long-run average cost of sampling, per unit time, should not exceed a given
bound, imposed (e.g.) by the capacity of the storage device.
How should we shift from one sampling rate to another, and back again?
What sorts of gains in performance can we expect to achieve using this
technology? In the setting of wavelet estimators, we suggest an answer to the
first of these questions; and, for our particular rate-switching algorithm, we
answer the second question. Our results help to underpin recent accounts of
this type of methodology, discussed by, for example, Chen, Itoh and Shiki
(1998), Aldroubi and Gröchenig (2001) and Anon (2002). Our main
arguments and results may be summarized in elementary language, even though
their detailed description requires somewhat intricate theory. We give the
summary below.

A wavelet estimator is, of course, particularly good at recovering complex
signals from noisy data. Nevertheless, if the sampling rate is only $\rho$, then
a wavelet estimator is unable to adequately approximate a signal whose
frequency approaches $\rho$, particularly if that frequency is only exhibited over
a short time interval. Moreover, if the sampling rate is only $\rho$, then we may
not even be aware that signals with frequency greater than $\rho$ are present.
However, a sudden increase in frequency, at a level somewhat below the
“base” sampling rate, say $\rho = \rho_1$, might be interpreted as suggesting that
higher frequencies are present. Hence, an increase in resolvable frequencies
might reasonably be used to trigger an increase in the sampling rate to $\rho_2$,
say.

In a wavelet estimator, high-frequency terms are incorporated after an
empirical assessment, based on a threshold, of whether or not the coefficients
of those terms are significantly different from zero. We use the occurrence
of one or more relatively large values among the coefficients to indicate the
presence of high-frequency oscillations, and to trigger an increase in sampling
rate, from $\rho_1$ to $\rho_2$. Likewise, the absence of large coefficients among these
terms is used to trigger a return to rate $\rho_1$.

This algorithm has a number of variants, including (e.g.) using
majority-type rules, applied to sets of resolution levels, to define triggers for increasing
or decreasing the sampling rate, and using more than two different sampling
rates. An explicit restriction on the amount of time during which we sample
at the higher rate can be imposed to ensure a relatively early return to
rate $\rho_1$ when there is a danger of exceeding data storage capacity.

Of course, if the long-run sampling cost is kept fixed, then increasing the
sampling rate in some parts of the signal inevitably involves reducing it in
others, relative to the rate that would be employed if sampling were uniform.
This necessarily reduces the fidelity of the signal estimate there, measured
mathematically in terms of $L_p$ distance, for example. However, in the
context of machine-recorded data that motivates our work, error variances are
usually small, and so the $L_p$ error penalties incurred by slightly reducing
the sampling rate in places where the signal is relatively “uninteresting” are
not likely to be high.
The potential gain is that relatively high frequencies that would normally
be overlooked can now be recovered. Provided the time periods where these
frequencies occur are relatively short in duration, and assuming our
algorithm adapts sufficiently quickly, using a higher sampling rate there will
require only a modest reduction in the rate at the more common places
where the signal is “quiet.”

There is, of course, a vast and rapidly growing literature on statistical
properties of wavelet methods. We mention here only the papers of Donoho
and Johnstone (1994, 1995) and Donoho, Johnstone, Kerkyacharian and Picard
(1995); further literature will be discussed in Section 2. Methods for
optimal design, in the setting of wavelet methods, have been suggested by
Herzberg and Traves (1994) and Oyet and Wiens (2000), although not in
the context treated in the present paper.

2. Methodology. 

2.1. Model for data generation. In practice, while digitally recorded data 
might be the result of sampling at unequally spaced times, they would, 
nevertheless, involve sampling at times on a grid. (Not all grid points need 
have data associated with them, however.) Reproducing an approximation 
to the true signal may involve a mixture of imputation, to estimate the signal 
at grid points where it was not sampled, and interpolation or smoothing, to 
reduce the impact of noise. If the edge length of the grid is $\xi$, then the grid
points are $k\xi$, for $-\infty < k < \infty$.

Usually, $\xi$ would equal the minimum possible spacing between adjacent
sampling times $T_i$, which are indexed in increasing order, are distinct, are
integer multiples of $\xi$, and (conceptually) increase from the infinite past to
the infinite future. At time $T_i$ we record datum $Y_i$, given by

(2.1) $Y_i = g(T_i) + \varepsilon_i, \quad -\infty < i < \infty,$

representing the value of the true signal $g$ at time $T_i$, degraded by additive
noise $\varepsilon_i$. The $\varepsilon_i$'s are assumed to have zero mean and variance $\sigma^2$, and
the threshold for the wavelet estimator will be constructed so that it is
proportional to $\sigma$ and inversely proportional to the square root of sampling
rate. In this way it will reflect signal-to-noise ratio.

The value of $T_i$ is determined by previous data, and so is measurable in
the sigma-field $\mathcal{F}_{i-1}$ generated by the set of pairs $(T_j, Y_j)$ for $j \le i - 1$. It
will be assumed that the distribution of $\varepsilon_i$, conditional on $\mathcal{F}_{i-1}$, has zero
mean and finite variance and does not depend on $i$.

2.2. Data imputation. At any grid point $t = k\xi$, not necessarily one of
the $T_i$'s, we wish to estimate $g(t)$ using the data at the possibly unequally
spaced times $T_i$. There is a range of ways of achieving this goal, adapted (for
example) from methods suggested for dealing with nonregularly spaced
design in more conventional problems where wavelet estimators are employed.
See, for example, Hall and Turlach (1997), where interpolation is suggested;
Cai and Brown (1998) and Hall, Park and Turlach (1998), where
transformation and binning are employed; Cai and Brown (1999), who used a
universal thresholding technique; Sardy, Percival, Bruce, Gao and Stuetzle (1999),
who considered a variety of different methods; Zhang and Zheng (1999), who
addressed theoretical issues associated with nonregular design; Kovac and Silverman
(2000), who discussed coefficient-dependent thresholding; Antoniadis and Fan
(2001), who developed a penalization approach; Delouille, Franke and von Sachs
(2001), Delouille, Simoens and von Sachs (2001) and Delouille and von Sachs
(2002), who introduced “design-adapted” wavelet methods for a variety of
applications; and Pensky and Vidakovic (2001), who described theory for
projection-based techniques.

While these methods have excellent numerical and theoretical properties,
and can be expected to produce very good results in relatively familiar
settings, not all are suitable for online applications. This is particularly true
of relatively computer-intensive techniques, and of those that require that an
overview be taken of the full design distribution before determining how the
final estimator will be constructed. We shall borrow from Hall and Turlach
(1997) and Hall, Park and Turlach (1998), and at each grid point $k\xi$ impute
a datum $Z_k$, taking it to equal $Y_i$, where $T_i$ is chosen to be as close as
possible to $k\xi$ subject to not exceeding $k\xi$. Define $Z^t(s)$ to equal $Z_k$ for
$k\xi \le s < \min\{(k+1)\xi, t\}$ and to equal 0 otherwise. The superscript $t$ indicates
that the function $Z^t$ is based only on data that are sampled up to time $t$.
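
The imputation step can be sketched in a few lines of Python. This is only an illustration of the "most recent observation not later than the grid point" rule described above, not the authors' implementation. The function name `impute_on_grid` and the convention of expressing sampling times as integer grid indices (that is, $T_i/\xi$) are assumptions made for the example, as is the choice to return 0 at grid points that precede the first observation.

```python
from bisect import bisect_right


def impute_on_grid(sample_indices, sample_values, k_min, k_max):
    """Impute Z_k at grid points k*xi, k = k_min..k_max, by carrying the most
    recent observation forward.

    sample_indices: sorted integers T_i / xi (sampling times in grid units).
    sample_values:  the data Y_i recorded at those times.
    """
    imputed = {}
    for k in range(k_min, k_max + 1):
        # Position of the last sampling time that does not exceed k*xi.
        pos = bisect_right(sample_indices, k) - 1
        # Assumption: before the first observation there is nothing to carry
        # forward, so we return 0.0 there.
        imputed[k] = sample_values[pos] if pos >= 0 else 0.0
    return imputed


# Low-rate samples at every 6th grid point, then every grid point from k = 12.
z = impute_on_grid([0, 6, 12, 13, 14], [1.0, 2.0, 3.0, 4.0, 5.0], 0, 15)
```

Working in integer grid units avoids floating-point comparisons between $T_i$ and $k\xi$.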

To fully appreciate the role played by imputation, it is important to realize 
that the wavelet estimator will most likely appear only in the very last step 
of the chain: “record data-store/transmit data-recover signal.” It is at this 
final stage that imputation will occur, well after any decision has been taken 
about what to store or transmit. Therefore, although it might appear as 
though some sort of “internal sampling” at the higher sampling rate might 
avoid the need to interpolate, that will seldom be possible. 

2.3. Wavelet estimator. Let $\phi$ and $\psi$ denote the “father” and “mother”
wavelet functions, respectively, and let $r \ge 1$ be the least integer such that
$\int u^r \psi(u)\,du \ne 0$. That is, $\psi$ is of order $r$. Write $p$ for the primary resolution
level, put $p_i = 2^i p$ for $i \ge 0$, and define $\phi_j(t) = p^{1/2}\phi(pt - j)$ and
$\psi_{ij}(t) = p_i^{1/2}\psi(p_i t - j)$.

The wavelet coefficients for $g$ are $b_j = \int g\phi_j$ and $b_{ij} = \int g\psi_{ij}$, and the
corresponding expansion of $g$ is

(2.2) $g = \sum_j b_j \phi_j + \sum_{i=0}^{\infty} \sum_j b_{ij} \psi_{ij}.$

The estimators of $b_j$ and $b_{ij}$, based on data observed up to time $t$, are
$\hat b_j^t = \int Z^t \phi_j$ and $\hat b_{ij}^t = \int Z^t \psi_{ij}$, respectively. The hard-thresholded form of
our wavelet estimator of $g$ is

(2.3) $\hat g^t = \sum_j \hat b_j^t \phi_j + \sum_{i=0}^{q-1} \sum_j \hat b_{ij}^t I(|\hat b_{ij}^t| > \delta) \psi_{ij},$

where $\delta > 0$ denotes the threshold, and $q \ge 1$ needs to be chosen. A
soft-thresholded estimator may be constructed similarly.

We shall take

(2.4) $\delta = C\sigma(\rho^{-1}\log\rho)^{1/2},$

where $C > 0$ is a constant. This choice reflects the fact that the variance of
$\hat b_{ij}^t$ is, in the case of truly independent errors and to a good first approximation,
proportional to $\sigma^2/\rho$. Owing to the sequential nature of sampling, the errors
are not actually independent, but $\hat b_{ij}^t$ is, nevertheless, a martingale, and using
that result the same variance approximation can be derived.
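
As a hedged numerical illustration of (2.3) and (2.4), the following Python sketch computes the threshold $\delta$ and applies the hard-thresholding indicator to a list of estimated coefficients. The function names and the coefficient values are hypothetical; the values $C = 2$, $\sigma = 0.15$ and the two rates $100$ and $100/6$ are chosen only because they match the settings used later in Section 3.

```python
import math


def threshold(C, sigma, rho):
    """Threshold delta = C * sigma * (log(rho) / rho) ** 0.5, as in (2.4)."""
    return C * sigma * math.sqrt(math.log(rho) / rho)


def hard_threshold(coeffs, delta):
    """Keep a coefficient only if its absolute value exceeds delta."""
    return [b if abs(b) > delta else 0.0 for b in coeffs]


delta_high = threshold(C=2.0, sigma=0.15, rho=100.0)       # higher sampling rate
delta_low = threshold(C=2.0, sigma=0.15, rho=100.0 / 6.0)  # lower sampling rate

# Hypothetical coefficient estimates, thresholded at the high-rate level.
kept = hard_threshold([0.5, -0.05, 0.06, -0.2], delta_high)
```

With these inputs the lower sampling rate gives the larger threshold, reflecting the noisier coefficient estimates obtained from fewer data.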

In practice, $C$ and $\sigma^2$ usually would be chosen through prior experience
with both signals and equipment. In the type of application we envisage,
there would be no opportunity for a technician to adjust algorithms in situ;
the equipment would be expected to function as a “black box.” Therefore,
the only options are fixed, prior choice of parameters, or automatic, locally
adaptive choice. Arguments based on the needs for robustness, real-time
analysis and computational economy, and the fact that traditional measures
of performance do not apply in this problem, argue in favor of the latter
approach.

It is conventional to take $C = 2$ in the threshold, although $C > 2^{1/2}$ is
adequate for our purposes, as we shall show in Section 4. In the case of
heavy-tailed data one could use a larger moderate-deviation compensator
than the factor $(\log\rho)^{1/2}$ that we employ in (2.4). The compensator $\log\rho$ is
sometimes suggested as an alternative.

2.4. Time delays in near-real time inference. A feature that distinguishes
the context of the present paper from more conventional curve-estimation
problems is its “online, real-time” nature. There are two aspects to this,
arising when recording and “playing back” the data, respectively. When
recording the data we wish to detect sharp increases or decreases in frequency
relatively quickly, and to change the sampling rate accordingly. Here, the
time-delay should ideally be very small.

The recorded data might, for example, represent acoustic information
stored on a CD track that we have played up to time $t$. We wish to produce
an approximation to the signal at $t$. When playing the data back in this way
it will usually not be a problem if we interpret “at $t$” a little liberally. For
instance, it is not essential that the sound we hear at $t$ represent the signal
which, at that very time, is being read from the CD by the laser reader.
We are prepared to accept a short time delay, the length of which is not as
crucial as in the recording phase. In other settings, for example, where the
recorded data represent remotely sensed information that will be subjected
to detailed analysis in a laboratory, time delay at playback will be even less
of an issue.

As we shall show in a moment, it is convenient to take time delays at
both recording and playback stages to be inversely proportional to the
primary resolution level, $p$, which is an increasing function of the long-term
average sampling rate. Hence, the high sampling rates of contemporary
digital equipment lead to low time-delays. Moreover, the low noise that often
characterizes machine-recorded data, and which permits even larger primary
resolution levels, further reduces time-delay.

We shall assume $\phi$ and $\psi$ are continuous on the real line and have compact
support, contained within the finite interval $[a, b]$, where $a < 0 < b$. It follows
that, for a given index $i$, an integral of the form $\int \alpha\psi_{ij}$, for a function $\alpha$,
involves $\alpha(t)$ only if $t \in [(a + j)/p_i, (b + j)/p_i]$. If the integral depends on
$\alpha(t)$, then it does not depend on $\alpha(s)$ for $s > t + p_i^{-1}(b - a)$, and, hence, not
for $s > t + p^{-1}(b - a)$. A similar argument applies to integrals of the form
$\int \alpha\phi_j$.

Therefore, if it is acceptable to have a delay of $\tau = (b - a)/p$ time units,
between when a signal at $t$ is sampled and when its value at $t$ is estimated,
then $t$ can be taken to be an “interior” point of the estimator. In this case
estimating $g(t)$ by $\hat g^{t+\tau}(t)$ is appropriate. The latter is identical to $\hat g^s(t)$ for
any $s > t + \tau$ and, in particular, is identical to the familiar wavelet estimator
that would be used if the full dataset, in infinite time, were employed. It is,
therefore, no longer necessary to employ the superscript on $\hat g^t$, $\hat b_j^t$ and $\hat b_{ij}^t$,
and we shall usually not use it in the sequel.

The time taken to respond to a change in signal frequency, by increasing
or decreasing the sampling rate, will usually equal an integer multiple of $\xi$
that is not less than $\tau$. To appreciate how small $\tau$ might be in practice, note
that the optimal choice of $p$, for a high sampling rate, is large. Indeed, the
appropriate value of $p^{-1}$ is approximately equal to
$(\kappa\sigma^2/\gamma^2)^{1/(2r+1)}\rho^{-1/(2r+1)}$,
where $\sigma^2$ denotes noise variance, $\gamma^2$ is the average of the squared $r$th
derivative of the signal, $\kappa$ is a constant depending only on the wavelet type, and $\rho$
(expressed as a frequency in Hz, denoting the number of samples per second)
is the sampling rate. See Section 4 for details. Usually, $\kappa^{1/(2r+1)}$ is only a
little greater than 1, $\sigma^2$ is small, $\gamma^2$ is moderately large, and $\rho^{-1/(2r+1)}$ is
small; see below for discussion. As a result, $\tau$ can be kept to a small fraction
of a second.

Sampling rates (and, hence, values of $p$) for familiar digital consumer
devices are generally quite high. They vary from 8 kHz (for digital
telephony), through 32 kHz (digital radio), 44.1 kHz (for conventional CDs) and
96 kHz or 192 kHz for DVD audio, to several MHz for new multi-channel
systems. Taking these values to the power $-1/5$, so as to model a second-order
smoother, gives a small value for $\rho^{-1/5}$. For example, it is 0.1 in the case of
a 100 kHz sampler.
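
The size of the factor $\rho^{-1/(2r+1)}$ for the device rates just listed can be checked with a short Python sketch. It evaluates only this rate-dependent factor, with $r = 2$; it does not compute the delay $\tau$ itself, which also depends on the wavelet support and on the constants $\kappa$, $\sigma^2$ and $\gamma^2$. The function name `rate_factor` and the dictionary of rates are illustrative assumptions.

```python
def rate_factor(rho_hz, r=2):
    """Return rho ** (-1 / (2r + 1)), the rate-dependent factor in 1/p."""
    return rho_hz ** (-1.0 / (2 * r + 1))


# Sampling rates quoted in the text, in Hz.
rates_hz = {
    "digital telephony": 8_000,
    "digital radio": 32_000,
    "conventional CD": 44_100,
    "DVD audio": 96_000,
    "100 kHz sampler": 100_000,
}

factors = {name: rate_factor(rho) for name, rho in rates_hz.items()}
for name, value in factors.items():
    print(f"{name:>18}: rho^(-1/5) = {value:.3f}")
```

The factor decreases as the sampling rate increases, and equals 0.1 at 100 kHz, in agreement with the example in the text.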

2.5. Rate-switching rule. Our rule operates on the principle that a
signal can be deemed to be relatively erratic, and the sampling rate increased,
when the thresholded terms in the double series at (2.3) start to make
significant contributions. In theory, the sampling rate can be varied virtually
in the continuum. However, we shall treat only a two-rate regime, where
the estimator is constructed using rate $\rho_1$ on the majority of occasions, but
the rate is increased to $\rho_2$ on relatively rare occasions when high-frequency
thresholded terms start to be included in the estimator. Likewise, a
reduction in the sampling rate, from $\rho_2$ back to $\rho_1$, is triggered when the threshold
inequality $|\hat b_{ij}| > \delta$ starts to fail to be satisfied.

Therefore, the determination of sampling rate, which is done only at the
recording step, uses just the wavelet coefficients $\hat b_{ij}$, not the full estimator $\hat g$.
The latter is employed only at the playback step. Nevertheless, our analysis
will treat the two steps together, because the strategy employed during
recording must be justifiable by good performance during playback.

2.6. A specific variable-sampling rate estimator. If, at the current time,
we are sampling the signal at time points that are integer multiples of $\ell\xi$,
we shall say we are sampling at rate $\rho_1 = (\ell\xi)^{-1}$. If we are sampling at all
integer multiples of $\xi$, we shall say we are sampling at rate $\rho_2 = \xi^{-1}$.

Let $p$ denote a primary resolution level (appearing in the definition of
estimators in Section 2.3) that is appropriate when the sampling rate is
constant at $\rho_1$ over a long period. Choice of $p$ will be discussed in Section 4.
We shall use this $p$ throughout, even when the sampling rate is $\rho_2$, and rely
on thresholded terms to produce improved performance when the signal is
relatively erratic. However, the values of $q$ at (2.3), and $\delta$ at (2.4), will be
rate-dependent. Each will be given the subscript $j$ when the sampling rate
is $\rho_j$.

Let $\tau_0$ denote the least integer multiple of $\ell\xi$ that is not less than $(b - a)/p$,
where $b - a$ is an upper bound to the widths of the supports of $\phi$ and $\psi$. Then
the time-delay $\tau$, introduced in Section 2.4, does not exceed $\tau_0$. Put $q = q_2$ if
the sampling rate has been $\rho_2$ for at least the last $\tau_0$ time units, and $q = q_1$
otherwise. Likewise, recalling (2.4) and defining $\delta_j = C\sigma(\rho_j^{-1}\log\rho_j)^{1/2}$, we
employ the threshold $\delta_2$ when the sampling rate has been $\rho_2$ for at least
the last $\tau_0$ time units, and we use the threshold $\delta_1$ otherwise. In practice,
the value of $\sigma$ would be replaced by a value determined after extensive
experimentation with real data. Section 4 will discuss choice of $q_1$, $q_2$ and


P. HALL AND S. PENEV 


the constant $C$. We could use a smaller time delay than $\tau_0$ when sampling
at the higher rate $\rho_2$, but choose not to so as to simplify discussion.

The actual estimator used is given at (2.3). There we take $t = s + \tau_0$ and
evaluate the estimator at $s$. It follows that the coefficient estimators $\hat b_j$ and
$\hat b_{ij}$ have the same form they would if the full dataset, in infinite time, were
employed. Using this interpretation of $\hat b_j$ and $\hat b_{ij}$, we define

$\hat g(s) = \sum_j \hat b_j \phi_j(s) + \sum_{i=0}^{q-1} \sum_j \hat b_{ij} I(|\hat b_{ij}| > \delta) \psi_{ij}(s),$

where the rule given in the previous paragraph is used to determine $q$ and $\delta$.

Next we define the mechanism for changing the sampling rate. If we are
currently sampling at rate $\rho_1$, then, at time $t = k\ell\xi$, we increase the rate to
$\rho_2$ if and only if at least one of the values of $|\hat b_{ij}|$, for $j$ such that $\psi_{ij}(t) \ne 0$
and for $i$ such that $p_i$ exceeds a predetermined lower bound $\pi_1$, exceeds the
threshold $\delta_1$. If we are currently sampling at rate $\rho_2$, and have been for at
least $\tau_0$ time units, we continue at this rate until the next time $t = k\xi$ at
which none of the values of $|\hat b_{ij}|$, for $\pi_2 < p_i < p_{q_2}$ and $j$ such that
$\psi_{ij}(s) \ne 0$ for some $s \in [t - \tau_0, t]$,
exceeds $\delta_2$. Here $\pi_2$ is another predetermined lower bound.

Our regularity conditions on $q_1$ and $q_2$ [see (4.3)] do not require $q_1 < q_2$.
However, taking $q_1 < q_2$ does reflect the fact that a higher sampling rate
allows a greater number of wavelet coefficients to be reliably estimated.
Similarly, our assumptions do not demand that $\pi_1 < \pi_2$, but this restriction
is not unnatural, for the following reason: $\pi_1$ can be viewed as the highest
frequency which the low-sampling-rate estimator is capable of adequately
resolving, and $\pi_2$ as the lowest frequency for which sampling at the higher
rate is necessary in order to produce an adequate estimate.

Of course, $\delta_1 > \delta_2$, but this does not contradict the fact that exceedances
of the thresholds $\delta_1$ and $\delta_2$ are used as parts of rules for increasing and
decreasing the sampling rate, respectively. The relatively large size of $\delta_1$
reflects only the fact that sampling at the lower rate produces relatively noisy
estimates of wavelet coefficients, which require a relatively high threshold in
order to guard against incorrect decisions caused by stochastic variation. It
is the values of $\pi_1$ and $\pi_2$, not those of $\delta_1$ and $\delta_2$, which are instrumental in
determining whether high- or low-frequency features are present.

Therefore, the rule for switching from rate $\rho_1$ to $\rho_2$ is to increase the rate
if and only if

(R.1) $|\hat b_{ij}| > \delta_1$ for some pair $(i, j)$ with $\psi_{ij}(t) \ne 0$ (at current time $t$)
      and $\pi_1 < p_i < p_{q_1}$, where $p = o(\pi_1)$ and $\pi_1 = o(p_{q_1})$;

and the rule for switching back again is

(R.2) $|\hat b_{ij}| \le \delta_2$ for each pair $(i, j)$ for which $\psi_{ij}(s) \ne 0$ for
      some $s \in [t - \tau_0, t]$ (where $t$ denotes current time)
      and $\pi_2 < p_i < p_{q_2}$ [where $p_1 = o(\pi_2)$ and $\pi_2 < p_{q_2}$].

Fig. 1. Flow chart summarizing rate-switching algorithm.

[Both (R.1) and (R.2) include regularity conditions which will be used in
Section 4.] An overview of the algorithm, after these rate-switching rules are
incorporated, is given in the flow chart in Figure 1.
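
The two rules can be read as a small two-state machine. The Python sketch below is one possible rendering under stated assumptions, not the authors' code: the names `next_rate` and `any_exceeds` are hypothetical, the caller is assumed to supply only those coefficient estimates whose wavelets are nonzero on the relevant time window, and the numerical values in the usage lines are purely illustrative.

```python
LOW, HIGH = "low", "high"


def any_exceeds(coeffs_by_level, level_freqs, lower, upper, delta):
    """True if some |b_ij| with lower < p_i < upper exceeds delta."""
    return any(
        abs(b) > delta
        for i, row in coeffs_by_level.items()
        if lower < level_freqs[i] < upper
        for b in row
    )


def next_rate(state, time_at_high, tau0, coeffs_by_level, level_freqs,
              pi1, pi2, p_q1, p_q2, delta1, delta2):
    """Apply the switch-up rule (R.1) or the switch-down rule (R.2).

    coeffs_by_level maps a level i to the coefficient estimates that are
    'active' (wavelet nonzero at the current time for R.1, or somewhere in
    the last tau0 time units for R.2); selecting them is left to the caller.
    """
    if state == LOW:
        # (R.1): move to the high rate if any active coefficient is large.
        up = any_exceeds(coeffs_by_level, level_freqs, pi1, p_q1, delta1)
        return HIGH if up else LOW
    if time_at_high < tau0:
        # Stay at the high rate for at least tau0 time units.
        return HIGH
    # (R.2): return to the low rate only if no active coefficient is large.
    stay = any_exceeds(coeffs_by_level, level_freqs, pi2, p_q2, delta2)
    return HIGH if stay else LOW


# Illustrative values only: p = 1, so p_i = 2**i.
freqs = {i: 2 ** i for i in range(5)}
args = dict(level_freqs=freqs, pi1=2, pi2=3, p_q1=16, p_q2=32,
            delta1=0.12, delta2=0.064, tau0=1.5)
s1 = next_rate(LOW, 0.0, coeffs_by_level={1: [0.5], 2: [0.01]}, **args)
s2 = next_rate(LOW, 0.0, coeffs_by_level={2: [0.2]}, **args)
s3 = next_rate(HIGH, 0.5, coeffs_by_level={}, **args)
s4 = next_rate(HIGH, 2.0, coeffs_by_level={4: [0.01]}, **args)
```

A cap on the total time spent at the high rate, of the kind mentioned in the text, could be added as a further condition in the `HIGH` branch.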

Constraints on the amount of time spent sampling at the higher rate can
be introduced to prevent the storage device from filling too rapidly. The
algorithm depends on a number of “tuning” parameters, in particular, $p$, $q_1$
and $q_2$, $\pi_1$ and $\pi_2$, the constant $C$ in the threshold formula (2.4), and, of
course, the sampling rates themselves. In practice these quantities would be
chosen from practical experience with the signal type.

2.7. Using local Fourier methods with windows of fixed length. A
reviewer has suggested that our wavelet approach might not be competitive
relative to a classical local Fourier method using a window of fixed width.
However, if, for example, the signal were to have a discontinuity within the
interior of the window, or if the values of the signal at either end of the
window were unequal, then the Fourier approach (which would perform poorly
for functions with discontinuities, interpreted in a periodic sense) would
not give good results. In principle this problem could be overcome by
choosing the interval adaptively so that discontinuities were situated at its ends.
However, that would require continuous local testing for change-points and
would arguably be difficult to implement in an on-line fashion. Moreover,
such an approach would not address cases where function values at the ends
of the interval were different, or where other sorts of signal irregularities,
readily adapted to by wavelets, were present.

This issue, of the noncompetitiveness of fixed-bandwidth, local-Fourier
methods relative to wavelet ones, in the context of signals with
discontinuities and other types of irregularity, is unrelated to our rate-switching
scheme. It arises equally in conventional function estimation problems.

3. Numerical properties. 

3.1. Smooth signals with aberrations. Here we illustrate performance in
the case of a smooth sinusoid with four different aberration sequences,
depicted in Figures 2-5, respectively. Figures 2 and 3 deal with aberrations of
increasing amplitude and fixed frequency, and increasing frequency and fixed
amplitude, respectively, added at extrema (i.e., at peaks and troughs) of the
sinusoid. Figures 4 and 5 address the same respective types of aberration,
but added at relatively “linear” places between peaks and troughs. Formulae
for the true functions, $g = g_1, \ldots, g_4$, used to produce the respective figures,
are given at (3.1)-(3.4).

Figures 3-5 have three panels, showing, respectively, the true signal, its
wavelet estimate based on dual-rate sampling and its estimate in the case
of fixed-rate sampling using the average of the sampling rates employed in
dual-rate sampling. In particular, for a given realization we calculated the
number of sampling operations used by the dual-rate algorithm to produce
the estimate in the second-to-last panel; and we then sampled at a constant
rate, using this number of sampling operations, and employed the data so
obtained to produce the estimate in the last panel. Similar results were
obtained if, in the constant-rate case, we sampled at the rate obtained by
averaging over all $B = 500$ Monte Carlo simulations in the dual-rate case.

The last three panels of Figure 2 show, respectively, the results described 
above. The first panel of Figure 2 depicts the noisy dataset from which 
the estimates in the third and fourth panels were computed. We have not 
shown the noisy data for the other three signals, since doing so adds little 
of interest. 

The superimposed dashed line, in the second-to-last panel of each figure, 
indicates sampling rate as a function of time. Where the line is at level 0 
or 0.5 the sampling rate was low or high, respectively. It can be seen that 
the rate actually switches up and down several times during high-frequency 
oscillations. (Using a slightly modified rate-switching rule virtually
eliminates these fluctuations and improves performance, but we do not show
those results here.)

For each panel of each figure the results shown are those for the realization
that gave, among all $B = 500$ realizations of data corresponding to that
figure, the median value of integrated squared error. (For two of the signals
we actually conducted $B = 1000$ simulations to check whether the results
were significantly different, but they were, in fact, virtually identical. The
results reported here are all for the $B = 500$ case.)

Fig. 2. Analysis of noisy observations of $g_1$. The first panel shows the noisy data at a
sampling rate of 100 Hz, the second panel shows the true signal [i.e., a graph of $y = g_1(t)$,
where $g_1$ is given by (3.1)], and the third and fourth panels show the estimates of $g_1$
obtained by dual- and constant-rate sampling, respectively. Here and in subsequent figures,
the superimposed dashed line indicates whether the algorithm was operating at the “high”
or the “low” sampling rate.

Fig. 3. Analysis of noisy observations of $g_2$. The first panel shows the true signal [i.e., a
graph of $y = g_2(t)$, where $g_2$ is given by (3.2)], and the second and third panels show the
estimates of $g_2$ obtained by dual- and constant-rate sampling, respectively.

Following standard practice we illustrate results in the cases $p = 1$ and
$C = 2$, although more favorable results were obtained for different values.
In the algorithm discussed in Section 2 we took $q_1 = 4$, $q_2 = 5$, $\pi_1 = 2$ and
$\pi_2 = 3$.

The function $\psi$ was chosen from the Daubechies family of compactly
supported wavelets with extremal phase and $r = 5$ (i.e., with the length of
its support equal to $2r - 1 = 9$).

Signals were sampled at discrete points in the interval $[0, 100]$. To each
sampled value, Normal $N(0, 0.15^2)$ noise was added. The edge length of the
basic sampling grid (i.e., the minimum permitted spacing between adjacent
sampling times) was chosen to be $\xi = 0.01$, which would correspond to a
sampling rate of 100 data per unit time if sampling were performed at each
grid point. This rate, which we shall refer to as “100 Hz,” is the rate used
to construct the picture of noisy data in the first panel of Figure 2.

Fig. 4. Analysis of noisy observations of $g_3$. Panels are in the same order as in Figure 3.


For our dual-rate algorithm, in the low-rate mode we sampled every sixth
observation. That is, in the notation of Section 2.6, we took $\ell = 6$.
Equivalently, $\rho_1 = 100/6 \approx 17$ Hz. In the high-rate mode we sampled at each grid
point, so that $\rho_2 = 100$ Hz. Sampling at the high rate continued until the
end of a minimal time period, of length $(2r - 1)100(6p)^{-1} + 1 \approx 150$ units,
in which no wavelet coefficient exceeded the threshold.
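
The quantities quoted in this paragraph follow directly from the stated settings, as the short Python check below illustrates. The variable names are chosen for the example only; the values $\xi = 0.01$, $\ell = 6$, $r = 5$ and $p = 1$ are those given in the text.

```python
xi = 0.01   # edge length of the sampling grid
ell = 6     # low-rate mode samples every ell-th grid point
r = 5       # order of the Daubechies wavelet
p = 1       # primary resolution level

rho_2 = 1.0 / xi            # high sampling rate, in "Hz"
rho_1 = 1.0 / (ell * xi)    # low sampling rate, in "Hz"

# Minimal period at the high rate with no coefficient above the threshold.
min_high_period = (2 * r - 1) * 100 / (6 * p) + 1

print(f"rho_1 = {rho_1:.2f}, rho_2 = {rho_2:.0f}, period = {min_high_period:.0f}")
```

The low rate evaluates to about 16.7, which the text rounds to 17 Hz, and the minimal period evaluates to 151, which the text rounds to 150.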

For each signal type, observations from the first 5% of the time interval 
[0,100], sampling at the higher rate, were used to estimate error variance 
and, thereby, compute thresholds. In particular, we did not assume error 
variance to be known, although in practice it would most likely be fixed in 
advance. 

Given an interval $[a, b]$, let $I_{[a,b]}(t) = 1$ or 0 according as $t \in [a, b]$ or
$t \notin [a, b]$. In this notation, formulae for the functions shown in the first panels
of Figures 3-5, and second panel of Figure 2, are, respectively,

(3.1) $g_1(t) = 2.4\sin(0.06\pi t) + 0.2525\sin\{8\pi(t - 24)\} I_{[22,26]}(t)$
      $\quad + 0.5050\sin\{8\pi(t - 42)\} I_{[40,44]}(t)$
      $\quad + 0.7575\sin\{8\pi(t - 58)\} I_{[56,60]}(t)$
      $\quad + 1.1\sin\{8\pi(t - 75)\} I_{[73,77]}(t),$

(3.2) $g_2(t) = 2.4\sin(0.06\pi t) + 0.35\sin\{2\pi(t - 24)\} I_{[22,26]}(t)$
      $\quad + 0.35\sin\{4\pi(t - 42)\} I_{[40,44]}(t)$
      $\quad + 0.35\sin\{6\pi(t - 58)\} I_{[56,60]}(t)$
      $\quad + 0.35\sin\{8\pi(t - 75)\} I_{[73,77]}(t),$

(3.3) $g_3(t) = 2.4\sin(0.06\pi t) + 0.2525\sin\{8\pi(t - 32)\} I_{[30,34]}(t)$
      $\quad + 0.5050\sin\{8\pi(t - 51)\} I_{[49,53]}(t)$
      $\quad + 0.7575\sin\{8\pi(t - 67)\} I_{[65,69]}(t)$
      $\quad + 1.1\sin\{8\pi(t - 84)\} I_{[82,86]}(t),$

(3.4) $g_4(t) = 2.4\sin(0.06\pi t) + 0.35\sin\{2\pi(t - 32)\} I_{[30,34]}(t)$
      $\quad + 0.35\sin\{4\pi(t - 51)\} I_{[49,53]}(t)$
      $\quad + 0.35\sin\{6\pi(t - 67)\} I_{[65,69]}(t)$
      $\quad + 0.35\sin\{8\pi(t - 84)\} I_{[82,86]}(t).$

Fig. 5. Analysis of noisy observations of $g_4$. Panels are in the same order as in Figure 3.

These represent a basic sinusoid, with frequency 0.06 and formula $g(t) =
2.4\sin(0.06\pi t)$, to which are added, in the cases of functions $g_1, \ldots, g_4$,
respectively, (i) four aberrations, each with frequency $8\pi$ and with increasing
amplitudes, and each of duration four time units; (ii) four aberrations, each
with amplitude 0.35 and with increasing frequencies [culminating in the
frequency arising in case (i)], and each of duration four time units; and (iii)
and (iv) the respective versions of (i) and (ii) where the four aberrations
are added midway between extrema of the sinusoid. (The aberrations are of
the same duration in each case, although it may appear that durations are
longer in the cases of signals $g_3$ and $g_4$.)
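
For readers who wish to reproduce the test signals, the following Python sketch evaluates $g_1, \ldots, g_4$ from the amplitudes, frequencies and intervals given in (3.1)-(3.4), and adds $N(0, 0.15^2)$ noise as described above. The function and variable names are hypothetical, each aberration is encoded by its amplitude, its frequency multiple of $\pi$ and the centre of its four-unit interval, and the random seed is arbitrary.

```python
import math
import random

# Each aberration: (amplitude, multiple of pi in the sine, interval centre).
G1 = [(0.2525, 8, 24), (0.5050, 8, 42), (0.7575, 8, 58), (1.1, 8, 75)]
G2 = [(0.35, 2, 24), (0.35, 4, 42), (0.35, 6, 58), (0.35, 8, 75)]
G3 = [(0.2525, 8, 32), (0.5050, 8, 51), (0.7575, 8, 67), (1.1, 8, 84)]
G4 = [(0.35, 2, 32), (0.35, 4, 51), (0.35, 6, 67), (0.35, 8, 84)]


def base(t):
    """The underlying sinusoid 2.4 sin(0.06 pi t)."""
    return 2.4 * math.sin(0.06 * math.pi * t)


def signal(t, aberrations):
    """Base sinusoid plus aberrations, each active on [centre - 2, centre + 2]."""
    value = base(t)
    for amplitude, multiple, centre in aberrations:
        if centre - 2 <= t <= centre + 2:
            value += amplitude * math.sin(multiple * math.pi * (t - centre))
    return value


def noisy_sample(t, aberrations, rng, sigma=0.15):
    """One observation Y = g(t) + eps with eps ~ N(0, sigma^2)."""
    return signal(t, aberrations) + rng.gauss(0.0, sigma)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
# High-rate sampling of g1 at every grid point k * 0.01 in [0, 100].
data = [noisy_sample(k * 0.01, G1, rng) for k in range(10001)]
```

Sampling at the low rate would use every sixth grid point instead of every one.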

The following qualitative properties of dual-rate sampling are illustrated
by Figures 2-5. Performance advantages are generally most clear in
particularly difficult cases, where the aberrations are of relatively high frequency
and low amplitude and so are difficult to distinguish from noise. (The first
of the four aberrations added to the signal in Figure 2 is of just this type.)
Even though the advantages of dual-rate sampling become more evident as
frequency increases, it can, nevertheless, perform well even for relatively
low-frequency aberrations (see Figures 3 and 5). Its potential is most marked
when an aberration is added to a part of the signal which is changing
relatively fast, such as to an extremum of the sine waves (see Figures 2 and 3).
However, in difficult cases, where the aberration is of low amplitude and
high frequency, it has much to offer in other cases too (see Figures 4 and 5).

The mean integrated squared errors (MISEs) for the four signals,
approximated by averaging integrated squared errors over the B = 500
simulations conducted for each of the four signals, were (i) 0.348, (ii) 0.320,
(iii) 0.341, (iv) 0.321 in the respective cases of $g_1, \ldots, g_4$, for
dual-rate sampling; and, respectively, (i) 0.963, (ii) 0.342, (iii) 0.965,
(iv) 0.348 for constant-rate sampling with the same average sampling rate.
Noting that the MISE advantages of dual-rate sampling are substantially
greater in cases (i) and (iii) than in cases (ii) and (iv), one reaches the
expected conclusion that dual-rate sampling primarily overcomes problems due
to aberrations of a high-frequency, rather than low-amplitude, nature.

Indeed, on the basis of these results one might argue that, in MISE terms,
the advantages of dual-rate sampling are marginal in the cases of signals
$g_2$ and $g_4$. At first sight this seems at variance with a visual
inspection of the figures. However, calculating mean integrated squared
errors over only the four intervals containing the aberrations, each of
length four time units, for each signal, one obtains instead the values
(i) 0.202, (ii) 0.180, (iii) 0.185, (iv) 0.162 in the dual-rate case, and
(i) 0.869, (ii) 0.245, (iii) 0.763, (iv) 0.209 in the constant-rate
setting. Therefore, in the cases of $g_2$ and $g_4$, dual-rate sampling does
confer an advantage in terms of its ability to resolve the aberrations,
although not as much of an advantage as in the cases of $g_1$ and $g_3$.
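
The relative sizes of these errors are easier to compare as ratios. The short Python sketch below simply divides the constant-rate values quoted above by the corresponding dual-rate values; the dictionary layout and variable names are illustrative only.

```python
# MISE values as quoted in the text (whole signal, and aberration intervals only).
dual_full = {"g1": 0.348, "g2": 0.320, "g3": 0.341, "g4": 0.321}
const_full = {"g1": 0.963, "g2": 0.342, "g3": 0.965, "g4": 0.348}
dual_local = {"g1": 0.202, "g2": 0.180, "g3": 0.185, "g4": 0.162}
const_local = {"g1": 0.869, "g2": 0.245, "g3": 0.763, "g4": 0.209}

def ratios(const, dual):
    """Constant-rate MISE divided by dual-rate MISE, signal by signal."""
    return {name: const[name] / dual[name] for name in dual}

full_ratio = ratios(const_full, dual_full)
local_ratio = ratios(const_local, dual_local)

for name in dual_full:
    print(f"{name}: whole signal {full_ratio[name]:.2f}, "
          f"aberration intervals {local_ratio[name]:.2f}")
```

On these figures the ratio exceeds 2.7 for $g_1$ and $g_3$ over the whole signal and exceeds 4 on the aberration intervals, whereas for $g_2$ and $g_4$ it stays below 1.1 over the whole signal and rises to roughly 1.3 to 1.4 on the aberration intervals.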

3.2. Discontinuous signals with aberrations. In order to show that our
algorithm is not adversely affected by jump discontinuities in signals, we
applied it to signals which had high-frequency aberrations just before, or
just after, or shortly before or after, jumps.

Fig. 6. Analysis of noisy observations of $g_5$ and $g_6$. The top two panels
show the true functions. Immediately below them are the respective estimates,
obtained using the rate-switching rule.

Specifically, the function $g_5$ has jumps at points where the function is
increasing or decreasing, and one jump (the third) at a point where it
changes from increasing to decreasing. On the other hand, the “block”
function $g_6$ has zero derivative except at points that are either part of
high-frequency aberrations, or are located at jumps. Apart from the fact that
we use different functions in the present section, all settings (and, in
particular, all tuning parameters) are the same as in Section 3.1.

Results are summarized in Figure 6, which shows (for each of the two
signals) the realization that gave the median value of integrated squared
error out of the 500 simulations conducted. In the case of the signal $g_5$,
it can be seen that the isolated discontinuity near t = 20 causes no problems
for the rate-switching rule, and that the method takes in its stride even the
very large discontinuity near t = 40, which has high-frequency aberrations
on either side. The Gibbs phenomena which are present, and which are
related to the jump discontinuities, also appeared when our experiments were
conducted in the absence of the high-frequency aberrations (but with the
jumps still present). These phenomena are a feature of the wavelet method
rather than of the rate-switching rule.

The method also gives good results for the function $g_6$, although better
performance for such a signal (with or without the high-frequency episodes)
usually is obtained using a Daubechies wavelet with a smaller value of r.
(We employed the same parameter values throughout our study, since we
did not wish parameter values to be confounded with function type when
interpreting the results.) The isolated discontinuities at t = 20 and t = 80
are dealt with very well, and the method also enjoys good performance around
the points t = 60 and t = 63. The Gibbs effects to the right of t = 43 and
t = 90 are caused by the large discontinuities there.

Isolated jump discontinuities, such as those in the functions $g_5$ and
$g_6$, can sometimes trigger an increase in sampling rate; the jumps may be
“misinterpreted” by the algorithm as very high-frequency phenomena. However,
this does not cause difficulty. When a sampling-rate increase occurs at a jump
discontinuity, it elicits extra information about the location and size of the
jump, and that does no harm. Moreover, the rate quickly switches down
again after the jump, so little cost is incurred through additional sampling.

For example, in the case of the function $g_5$, jumps at the points 20, 60
and 80 were sufficiently small not to trigger any rate increase. A rate change
did generally occur in connection with the larger jump at 40, but without
detrimental effects on the estimator. Indeed, the algorithm deduced that
the jump was followed by high-frequency events, and correctly maintained
sampling at the higher level until the high-frequency sinusoids were past.
If the high-frequency events immediately to the right of 40 were removed,
then the algorithm returned to the lower sampling rate immediately
after the jump.

The functions $g_5$ and $g_6$ are given by the following:

\[
\begin{aligned}
g_5(t) &= \log(1 + 0.1t)\,I_{[0,20]}(t) + [\exp\{0.1(t-20)\} - 1]\,I_{(20,40]}(t) \\
       &\quad + \log\{1 + 0.1(t-40)\}\,I_{(40,60]}(t) + \exp\{-0.2(t-60)\}\,I_{(60,80]}(t) \\
       &\quad + 0.5\log\{1 + 0.05(t-80)\}\,I_{(80,100]}(t) + \sin\{9(t-16)\}\,I_{(15,17]}(t) \\
       &\quad + \sin\{8(t-38)\}\,I_{(37,39]}(t) + \sin\{7.3(t-41)\}\,I_{(40,42]}(t) \\
       &\quad + \sin\{9(t-61)\}\,I_{(60,62]}(t) + \sin\{9(t-81)\}\,I_{(80,82]}(t), \\
g_6(t) &= I_{(20,40]}(t) + 4I_{(40,60]}(t) + 6I_{(60,80]}(t) + 5I_{(80,90]}(t) \\
       &\quad + \sin\{9(t-16)\}\,I_{(15,17]}(t) + \sin\{9(t-42)\}\,I_{(41,43]}(t) \\
       &\quad + \sin\{9(t-61)\}\,I_{(60,63]}(t) + \sin\{9(t-89)\}\,I_{(88,90]}(t).
\end{aligned}
\]
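
The jump sizes discussed above can be checked numerically. The Python sketch below transcribes $g_5$ and $g_6$ as we read them from the displayed formulae (half-open indicator intervals, and $(80, 100]$ for the last smooth piece of $g_5$, are our reading of the original); the helper `jump` and its tolerance `eps` are hypothetical conveniences.

```python
import math

def ind(t, lo, hi):
    """Indicator of the half-open interval (lo, hi]."""
    return 1.0 if lo < t <= hi else 0.0

def bursts(t, specs):
    """Sum of sin{w (t - c)} restricted to (lo, hi], for specs = (w, c, lo, hi)."""
    return sum(math.sin(w * (t - c)) * ind(t, lo, hi) for w, c, lo, hi in specs)

def g5(t):
    # Piecewise smooth trend; evaluated branch by branch so log() is never
    # called outside the interval on which it is used.
    if 0 <= t <= 20:
        trend = math.log(1 + 0.1 * t)
    elif 20 < t <= 40:
        trend = math.exp(0.1 * (t - 20)) - 1
    elif 40 < t <= 60:
        trend = math.log(1 + 0.1 * (t - 40))
    elif 60 < t <= 80:
        trend = math.exp(-0.2 * (t - 60))
    elif 80 < t <= 100:
        trend = 0.5 * math.log(1 + 0.05 * (t - 80))
    else:
        trend = 0.0
    return trend + bursts(t, [(9, 16, 15, 17), (8, 38, 37, 39), (7.3, 41, 40, 42),
                              (9, 61, 60, 62), (9, 81, 80, 82)])

def g6(t):
    steps = ind(t, 20, 40) + 4 * ind(t, 40, 60) + 6 * ind(t, 60, 80) + 5 * ind(t, 80, 90)
    return steps + bursts(t, [(9, 16, 15, 17), (9, 42, 41, 43),
                              (9, 61, 60, 63), (9, 89, 88, 90)])

def jump(g, t0, eps=1e-9):
    """Approximate jump g(t0+) - g(t0-)."""
    return g(t0 + eps) - g(t0 - eps)

for t0 in (20, 40, 60, 80):
    print(t0, round(jump(g5, t0), 3), round(jump(g6, t0), 3))
```

Under this transcription the jump of $g_5$ at 40 is several times larger in absolute value than those at 20, 60 and 80, which is consistent with the behaviour reported in the text.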
4. Theoretical properties. The main aim of this section is to establish,
under explicit conditions, the four properties below. Together they describe
the manner in which the rate-switching algorithm responds to different signal
frequencies, and the way in which it can increase the estimator’s overall
performance. Proofs are given in a longer version of the paper available on
the web [Hall and Penev (2002)].

Property I (Sampling rate remains at $\rho_1$ during “quiet” periods).
Suppose we start at the left-hand end of a finite interval $\mathcal{I}$
using sampling rate $\rho_1$, and that the signal is relatively quiet in
$\mathcal{I}$. Then the probability that rate $\rho_1$ persists right across
$\mathcal{I}$ converges to 1 as $\rho_1 \to \infty$. Moreover, the rate
will quickly switch from $\rho_2$ to $\rho_1$ if we start a quiet interval
at the higher rate. See Theorems 4.1 and 4.6 for details.

Property II (Sampling rate increases to $\rho_2$ when signal complexity
increases). Suppose the variable-rate estimator is operating at rate
$\rho_1$ during a quiet period, and then enters a period $\mathcal{J}$ of
relative activity. Then the algorithm will, with high probability, trigger a
switch from rate $\rho_1$ to $\rho_2$ during $\mathcal{J}$. See Theorem 4.4.

Property III (Sampling rate remains at $\rho_2$ through periods of
high-frequency fluctuations). Once the sampling rate has increased from
$\rho_1$ to $\rho_2$, it stays there with high probability, provided the
signal is sufficiently noisy. See Theorem 4.5.

Property IV (Dual sampling rates can enhance recovery of high-frequency
oscillations, with little adverse effect on estimation of low-frequency
features). If sampling is undertaken at rate $\rho_1$, then the estimator is
able to recover (in the sense of consistent estimation) signals that have r
continuous derivatives, and, indeed, can recover fluctuations that have
frequencies of smaller order than $\rho_1$. If sampling is at rate $\rho_2$,
then frequencies of smaller order than $\rho_2$ can be recovered. The
dual-rate estimator is able to consistently estimate high-frequency parts of
the signal that would not be accessible using a constant-rate estimator with
the same long-run average sampling cost. This can be done without degrading,
in first-order asymptotic terms, the accuracy of approximation in time
intervals where the signal is of relatively low frequency. See Theorem 4.7
and the discussion at the end of this section.

Our asymptotic arguments, based on high sampling rates, are justified
by the high rates and low noise levels which are commonly encountered in
practice; see Section 2.4 for discussion. We shall state our main results, and
particularly the regularity conditions, in such a way that the proofs do not
require analysis at the level of a martingale error process. Nevertheless, the
results have direct application in the latter context, as we shall relate.

Theorems 4.1-4.3 address performance of the wavelet estimator when
sampling is at a constant rate and the true signal is smooth. Our aim is to
indicate the appropriate sizes for $p$ and $\delta$ in this setting, thereby
motivating choices in the variable-sampling rate case when $g$ is not so
smooth. Therefore, for the present we take $g = g_0$, where

(4.1)   $g_0$ is an $r$-times continuously differentiable
        function defined on the whole real line.


Suppose too that

(4.2)   the errors $\varepsilon_i$, in (2.1), are identically distributed with
        $E|\varepsilon_i|^{6+B} < \infty$ for some $B > 0$, zero mean and
        variance $\sigma^2$.


Let $\rho$ denote either $\rho_1$ or $\rho_2$. For reasons that will become
clear in Theorem 4.2, the appropriate size of $p$ for smooth signals is
$\rho^{1/(2r+1)}$, for large $\rho$. To determine the correct size of $q$,
observe that we cannot resolve frequencies as large as $\rho$ if we are only
sampling at rate $\rho$. Therefore, we shall select $q$ so that $p_q$ is a
little smaller than $\rho$; let it be of order $\rho^{1-c}$, where $c > 0$.
This is equivalent to $2^q = O(p^{-1}\rho^{1-c})$. Note too that we use the
same $p$ when $\rho = \rho_1$ or $\rho = \rho_2$. These considerations
motivate the regularity condition

(4.3)   $p = p(\rho_1) \asymp \rho_1^{1/(2r+1)}$ and $q = q(\rho) \to \infty$ so
        slowly that $2^q = O(p^{-1}\rho^{1-c})$ for some $c \in (0,1)$.


(If $a$ and $b$ are positive functions of $\rho$, then the property
$a \asymp b$, as $\rho \to \infty$, means that the ratio $a/b$ is bounded
away from zero and infinity along the sequence.)
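
Condition (4.3) fixes only orders of magnitude, so any concrete choice involves constants that the theory leaves free. The Python sketch below shows one way the condition might be turned into numbers, reading $p_q$ as $2^q p$ (the reading under which the two statements in the text are equivalent). The constants `c0` and `big_k`, and the numerical values of the rates, of $r$ and of $c$, are hypothetical choices made for illustration only.

```python
import math

def primary_resolution(rho1, r, c0=1.0):
    """p of order rho1**(1/(2r+1)); c0 is a hypothetical proportionality constant."""
    return c0 * rho1 ** (1.0 / (2 * r + 1))

def num_levels(rho, p, c, big_k=1.0):
    """Largest integer q >= 0 with 2**q * p <= big_k * rho**(1 - c)."""
    bound = big_k * rho ** (1.0 - c) / p
    return math.floor(math.log2(bound)) if bound >= 1.0 else 0

# Hypothetical settings: base rate, high rate, wavelet order r, exponent c.
rho1, rho2, r, c = 1.0e5, 4.0e5, 3, 0.1

p = primary_resolution(rho1, r)   # the same p is used at both rates
q1 = num_levels(rho1, p, c)
q2 = num_levels(rho2, p, c)
print(f"p = {p:.2f}, q1 = {q1}, q2 = {q2}")
```

Because $p$ is tied to $\rho_1$ alone, moving to the higher rate changes only the number of levels $q$, which is how finer resolution levels become available after an upward rate switch.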

Finally, suppose that

(4.4)   $\phi$ and $\psi$ are each bounded and supported on the compact
        interval $[a, b]$, $\psi$ is of order $r$ as defined in Section 2.3,
        $\int \phi = 1$, and integer translates of $\phi$ are orthonormal.

Our next theorem shows that for functions such as $g_0$, the thresholded
terms only very rarely make a contribution to the estimator $\hat g$. Recall
that $\hat g(s) = \hat g^{\,s+\tau_0}(s)$, where $\hat g^{\,t}$ is given by
(2.3). Let $\mathcal{I}$ denote a finite interval.


Theorem 4.1. Suppose data are generated by the model at (2.1), with
signal $g = g_0$ and independent errors $\varepsilon_i$. Assume the sampling
rate is constant at $\rho = \rho_1$ or $\rho_2$, and that $\delta$ in the
definition at (2.3) is given by (2.4), with $C > 2^{1/2}$. Suppose too that
(4.1)-(4.4) hold, with $B > C^2/(1-c)$ in (4.2) and, in (4.3),
$(q, \rho) = (q_j, \rho_j)$ in the two respective cases. Then (a) if the
sampling rate is $\rho_1$, the probability that a thresholded term enters
nondegenerately into the estimator $\hat g(t)$ for some $t \in \mathcal{I}$
converges to zero as $\rho_1 \to \infty$; and (b) if the sampling rate is
$\rho_2 > \rho_1$, the probability that a thresholded term corresponding to a
resolution level $p_i$ greater than $C_1 \rho_2^{1/(2r+1)}$ (for an arbitrary
fixed $C_1 > 0$) enters nondegenerately into $\hat g(t)$ for some
$t \in \mathcal{I}$ converges to zero as $\rho_2 \to \infty$.

Our proof of the theorem shows that, for $\rho = \rho_1$ or $\rho_2$, the
respective probabilities equal $1 - O(\rho^{1-(C^2/2)+\eta})$ for each
$\eta > 0$. This type of bound also applies to all the probabilities that are
discussed in Theorems 4.4-4.6: each of the probabilities converges to 1 at
rate $\rho^{-A(C)}$, where $\rho$ denotes the relevant sampling rate and
$A(C)$ can be made arbitrarily large by choosing the constant $C$, in (2.4),
sufficiently large.

Part (a) of Theorem 4.1 motivates us to consider in more detail the
estimator obtained by sampling at the base rate $\rho_1$. A result in this
case was given by Hall and Patil (1995); the following version is better
adapted to the present context.


Theorem 4.2. Assume data are generated by the model at (2.1), with
independent errors $\varepsilon_i$ and time points $T_i$ equally spaced
$\rho_1^{-1}$ units apart. Suppose too that, for $(q, \rho) = (q_1, \rho_1)$,
(4.1)-(4.4) hold, and that $\delta$ in the definition at (2.3) is given by
(2.4) with $C > 2^{1/2}$. Then, for all finite intervals $\mathcal{I}$,

\[
\int_{\mathcal{I}} E(\hat g - g_0)^2
  = \rho_1^{-1} p\,|\mathcal{I}|
  + p^{-2r} \kappa^2 (1 - 2^{-2r})^{-1} \int_{\mathcal{I}} \bigl(g_0^{(r)}\bigr)^2
  + o\bigl(\rho_1^{-1} p + p^{-2r}\bigr)
\tag{4.5}
\]

as $\rho_1 \to \infty$.


It is immediately clear from (4.5) that for a constant sampling rate
$\rho_1$, and a smooth signal $g_0$ that is not a polynomial of degree
$r - 1$, the asymptotically optimal value of $p$ will be a constant multiple
of $\rho_1^{1/(2r+1)}$. This motivates the first part of (4.3), and suggests
that we should take $p$ to be of this size in the variable-sampling rate case
too, provided the signal is smooth “most” of the time.
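
The step from (4.5) to the size of $p$ is a one-line minimization, and it may help to see it numerically. Writing the two leading terms generically as $V p/\rho_1 + B p^{-2r}$, with $V, B > 0$ standing in for the variance and squared-bias constants (our notation, introduced for this sketch only), setting the derivative in $p$ to zero gives $p^* = (2rB\rho_1/V)^{1/(2r+1)}$. The Python sketch below checks this under those assumptions.

```python
def risk(p, rho1, r, v=1.0, b=1.0):
    """Leading terms of the integrated squared error: v*p/rho1 + b*p**(-2r).

    v and b are hypothetical stand-ins for the constants multiplying the
    variance and squared-bias terms."""
    return v * p / rho1 + b * p ** (-2 * r)

def optimal_p(rho1, r, v=1.0, b=1.0):
    """Minimizer of risk() in p, obtained by setting d(risk)/dp = 0."""
    return (2 * r * b * rho1 / v) ** (1.0 / (2 * r + 1))

rho1, r = 1.0e5, 3          # hypothetical base rate and wavelet order
p_star = optimal_p(rho1, r)
print(p_star, risk(p_star, rho1, r))

# Multiplying rho1 by 2**(2r+1) should exactly double the optimal p.
print(optimal_p(rho1 * 2 ** (2 * r + 1), r) / p_star)
```

At the minimizer both terms are of order $\rho_1^{-2r/(2r+1)}$, which is the source of the $L_2$ rate $\rho_1^{-r/(2r+1)}$ for the estimator itself.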

The $L_2$ convergence rate, when $\rho = \rho_1$ and
$p \asymp \rho_1^{1/(2r+1)}$, is, as implied by Theorem 4.2,
$O(\rho_1^{-r/(2r+1)})$. Based on experience for more conventional
estimators, we expect the $L_\infty$ convergence rate to differ from this by
no more than a factor $(\log \rho_1)^{1/2}$. Theorem 4.3 confirms this. The
reason for our interest in $L_\infty$ rates is that, for more complex signals,
we shall use consistency in the supremum metric to assess performance of the
estimator.


Theorem 4.3. Assume the conditions of Theorem 4.2, and, in particular,
that the sampling rates are constant at $\rho_1$. Then,

\[
\sup_{t \in \mathcal{I}} |\hat g(t) - g(t)|
  = O_p\bigl\{\rho_1^{-r/(2r+1)} (\log \rho_1)^{1/2}\bigr\}.
\]
Next we develop a model for high-frequency fluctuations. It will be
asymptotic in character, and depend on a parameter, $\nu$ say, which we could
interpret as one of the rates $\rho_1$ and $\rho_2$, but perhaps more
realistically as the long-term average sampling rate; see (4.12) for a
definition of the latter. Our theory will involve $\nu$ diverging to infinity.

Our model for the signal will amount to a smooth function $g_0$, described
at (4.1), to which we shall add (on an interval $\mathcal{J}$) fluctuations
at least one of which is of unboundedly large frequency. If the frequencies
of the fluctuations are represented by $a_1, a_2, \ldots$, then, in order for
at least one of them to lead to a rate change as suggested by rule (R.1) in
Section 2, we should assume that

(4.6)   for some $k = k(\nu)$, $a_k/\pi_1 \to \infty$ and $a_k = o(p_{q_1})$.

The high-frequency fluctuations that we shall add to $g_0$ will have the form
$\gamma\{a(\cdot - u)\}$, where $a = a_k$ and

(4.7)   $\gamma$ is a nondegenerate function, supported on the interval
        $[-1, 1]$ and having $r$ continuous derivatives on the real line.

Without loss of generality, $\gamma$ is centred so that

(4.8)   $\gamma^{(r)}(0) \neq 0$.

(Any shift in the location of $\gamma$ can be incorporated into the $u$’s.)
For the sake of simplicity we shall choose $\gamma$ to be the same for each
fluctuation, although our results are not changed if we use a more elaborate
construction. The locations $u$ and frequencies $a$ will vary, however, as
follows.

Since the function $g_0$ satisfies (4.1), its first $r$ derivatives are
bounded in any compact interval. On the other hand, if $\gamma$ satisfies
(4.7) and $a = a(\nu) \to \infty$, then the supremum of the absolute value of
any one of the first $r$ derivatives of $\gamma\{a(\cdot - u)\}$ diverges to
infinity in any open interval containing $u$. Therefore,
$\gamma\{a(\cdot - u)\}$ can fairly be said to exhibit fluctuations whose size
is an order of magnitude greater than in the case of $g_0$. We shall use the
former function to model high-frequency wiggles which trigger an increase in
the sampling rate, from $\rho_1$ to $\rho_2$.

We shall add the fluctuations within an interval $\mathcal{J}$, the length
of which could converge to zero as $\nu \to \infty$. Thus, there will be a
“cluster of wiggles” $\gamma_k$ in $\mathcal{J}$, described through a
sequence of pairs $(u_k, a_k)$ with the following property:

(4.9)   the functions $\gamma_k = \gamma\{a_k(\cdot - u_k)\}$ are all supported
        in $\mathcal{J}$, and no two of the support intervals
        $[u_k - a_k^{-1}, u_k + a_k^{-1}]$ overlap.

The signal that our wavelet estimator will endeavor to recover is

(4.10)  $g = g_0 + \sum_k \gamma\{a_k(\cdot - u_k)\}$.
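
To make the signal model concrete, the Python sketch below builds a function of the form (4.10) from a smooth $g_0$ and a list of pairs $(u_k, a_k)$, after checking the non-overlap requirement on the support intervals. The particular bump $\gamma(x) = (1 - x^2)^{r+1}$ on $[-1, 1]$ is a hypothetical choice of ours: it has $r$ continuous derivatives, and with the default $r = 2$ it satisfies $\gamma''(0) = -6 \neq 0$. The function names are likewise illustrative.

```python
import math

def gamma(x, r=2):
    """Hypothetical bump (1 - x^2)^(r+1) on [-1, 1], zero elsewhere."""
    return (1.0 - x * x) ** (r + 1) if abs(x) < 1.0 else 0.0

def supports_disjoint(pairs):
    """True if no two intervals [u - 1/a, u + 1/a] overlap (touching is allowed)."""
    intervals = sorted((u - 1.0 / a, u + 1.0 / a) for u, a in pairs)
    return all(prev_hi <= lo
               for (_, prev_hi), (lo, _) in zip(intervals, intervals[1:]))

def add_fluctuations(g0, pairs, r=2):
    """Return g = g0 + sum_k gamma{a_k (t - u_k)} for pairs = [(u_k, a_k), ...]."""
    if not supports_disjoint(pairs):
        raise ValueError("support intervals of the fluctuations overlap")
    def g(t):
        return g0(t) + sum(gamma(a * (t - u), r) for u, a in pairs)
    return g

# Hypothetical example: a slow sinusoid with two narrow wiggles.
g0 = lambda t: math.sin(t)
g = add_fluctuations(g0, [(1.0, 10.0), (1.5, 20.0)])
print(g(1.0) - g0(1.0), g(1.3) - g0(1.3))
```

Larger values of $a_k$ give narrower and more rapidly varying wiggles, so a cluster with increasing $a_k$ can be packed into an interval whose length shrinks.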
For the present we assume that the interval $\mathcal{J}$ is placed
immediately to the right of $\mathcal{I}$, so that (in view of Theorem 4.1)
the probability that the algorithm enters $\mathcal{J}$ using sampling rate
$\rho_1$ converges to 1 as $\nu \to \infty$. Our next result gives conditions
under which, if the fluctuations in $\mathcal{J}$ are as described at (4.10),
then (with high probability) a rate switch from $\rho_1$ to $\rho_2$
occurs during $\mathcal{J}$.

If the frequency $a_1$ of the first fluctuation satisfies (4.6) for $k = 1$,
then, with probability converging to 1 as $\nu \to \infty$, there will be a
switch to rate $\rho_2$ in the close vicinity of time $u_1$. This follows from
Theorem 4.4, on considering the case where the series at (4.10) consists of
the single fluctuation $\gamma\{a_1(\cdot - u_1)\}$. In such a case, the
theorem does not make any comment on what happens later in the interval
$\mathcal{J}$; that will be dealt with in Theorems 4.5 and 4.6.

Theorem 4.4. Suppose data are generated by the model at (2.1), with
signal $g$ given by (4.10). Assume the estimator $\hat g$ is constructed
using $C > 2^{1/2}$ in the threshold $\delta$, that the rule (R.1) is used to
define an upward rate switch, and that (4.1)-(4.4) hold, with
$B > C^2/(1-c)$ in (4.2) and, in (4.3), $(q, \rho) = (q_1, \rho_1)$. Suppose
too that (4.6)-(4.9) hold. If, on entering time interval $\mathcal{J}$, the
sampling rate is $\rho_1$, then with probability converging to 1 as
$\nu \to \infty$, an increase in the rate to $\rho_2$ will occur during time
interval $\mathcal{J}$.

We continue to assume the signal is composed of fluctuations that may
be modelled as at (4.10). However, when showing that the rate will not
change during the time interval $\mathcal{J}$, we make the additional
assumption that during each subinterval of $\mathcal{J}$, of length $\tau_0$,
there exists a fluctuation whose frequency is of larger order than $\pi_2$ and
of smaller order than $p_{q_2}$:

(4.11)  it is possible to choose a subset $\mathcal{A}$ of the set of all
        frequencies represented at (4.10), such that, for each time interval
        $\mathcal{K}$ of length $\tau_0$ included within $\mathcal{J}$, there
        is at least one $a_k \in \mathcal{A}$ such that the associated
        function $\gamma\{a_k(\cdot - u_k)\}$ is supported within
        $\mathcal{K}$, and, moreover,
        $\pi_2 = o(\min_{a \in \mathcal{A}} a)$ and
        $\max_{a \in \mathcal{A}} a = o(p_{q_2})$.

In the result below, we assume that we start the time interval
$\mathcal{J}$ using sampling at rate $\rho_2$. Thus, $\mathcal{J}$ can no
longer be thought of as following immediately after an interval where the
signal is smooth. However, it could follow immediately after a short interval
that contained a single fluctuation, of frequency $a = a_1$, which triggered a
switch from rate $\rho_1$ to $\rho_2$; see the paragraph immediately preceding
Theorem 4.4. Recall that rule (R.2), for switching to a lower sampling rate,
was given in Section 2.

Theorem 4.5. Assume that the estimator $\hat g$ is constructed using
$C > 2^{1/2}$ in the threshold $\delta$, that the rule (R.2) is used to define
a downward rate switch, and that (4.1)-(4.4) hold, with $B > C^2/(1-c)$ in
(4.2) and, in (4.3), $(q, \rho) = (q_2, \rho_2)$. Suppose too that
$|\mathcal{J}|$ is bounded as $\nu \to \infty$, that (4.7)-(4.9) and (4.11)
hold, that $g$ is given by (4.10), and that the sampling rate at the start of
$\mathcal{J}$ equals $\rho_2$. Then, with probability converging to 1 as
$\nu \to \infty$, the sampling rate stays at $\rho_2$ throughout
$\mathcal{J}$.

This result has an analogue in which the frequencies in $\mathcal{J}$ are
relatively low, and a switch from sampling rate $\rho_2$ to $\rho_1$ is
virtually assured:

Theorem 4.6. Assume the conditions in Theorem 4.5, except that the
constraints “$\pi_2 = o(\min_{a \in \mathcal{A}} a)$ and
$\max_{a \in \mathcal{A}} a = o(p_{q_2})$” at the end of (4.11) are changed
to “$\max_{a \in \mathcal{A}} a = o\{\min(\pi_1, \pi_2)\}$.” Then, with
probability converging to 1 as $\nu \to \infty$, the sampling rate switches
from $\rho_2$ to $\rho_1$ during $\mathcal{J}$, and stays there for the
duration of that time interval.

Finally we show that, when sampling is carried out at rate $\rho$, the
estimator is able to consistently recover frequencies almost up to the level
$\rho$.

Theorem 4.7. Suppose data are generated by the model at (2.1), with
independent errors $\varepsilon_i$. Assume the sampling rate is constant at
$\rho = \rho_1$ or $\rho_2$, and that the threshold $\delta$ is given by (2.4)
with $C > 2^{1/2}$. Suppose too that (4.2)-(4.4) hold, with $B > C^2/(1-c)$
in (4.2) and, in (4.3), $(q, \rho) = (q_1, \rho_1)$ or $(q_2, \rho_2)$ for
the respective sampling rates. Assume the signal is given by (4.10) on
$\mathcal{J}$, where $\max_k a_k = o(p_q)$. Then, for each $\eta > 0$, the
probability that $|\hat g - g| < \eta$ uniformly on $\mathcal{J}$ converges
to 1 as $\nu \to \infty$.

We conclude by quantifying some of the potential gains and losses from
dual-rate sampling. Suppose the expense of sampling, expressed, for example,
in terms of the capacity of the data storage device, demands that the
long-run sampling rate not exceed $\nu$ per unit time. If, in parts of the
signal that have relatively high frequency, we use rate $\rho_2 > \nu$ rather
than $\nu$, then (in order to stay within budget) at other time points we
should reduce the rate to $\rho_1$, where $\rho_1$ and $\rho_2$ are connected
by the formula

(4.12)  $\nu = \rho_1(1 - \Pi) + \rho_2 \Pi$,

and $\Pi$ denotes the long-run proportion of time for which we use rate
$\rho_2$.

It may be deduced from Theorem 4.2 that the condition for there to be no
asymptotic deterioration in mean-squared error, to first order, in the
relatively smooth places where rate $\rho_1$ is employed, is
$\nu \sim \rho_1$. In view of (4.12) this is, of course, equivalent to
$\Pi\rho_2/\rho_1 \to 0$ as $\nu \to \infty$. In the proportion $\Pi$ of the
time when we use the higher sampling rate, there is (in view of Theorem 4.7)
potential for consistently estimating the signal where this would not
otherwise be possible.
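
The budget constraint (4.12) is easily explored numerically. The Python sketch below solves it for the base rate $\rho_1$ given the budget $\nu$, the high rate $\rho_2$ and the proportion $\Pi$ of time spent at the high rate, and reports the ratio $\nu/\rho_1 = 1 + \Pi(\rho_2/\rho_1 - 1)$. The function name and the numerical values are hypothetical.

```python
def base_rate(nu, rho2, prop_high):
    """Solve nu = rho1*(1 - Pi) + rho2*Pi for rho1, with Pi = prop_high."""
    if not 0.0 <= prop_high < 1.0:
        raise ValueError("prop_high must lie in [0, 1)")
    rho1 = (nu - rho2 * prop_high) / (1.0 - prop_high)
    if rho1 <= 0.0:
        raise ValueError("budget nu cannot fund rate rho2 for this proportion of time")
    return rho1

# Hypothetical budget: 100,000 samples per unit time on average, with the
# high rate four times the budget.
nu, rho2 = 1.0e5, 4.0e5
for prop_high in (0.05, 0.01, 0.001):
    rho1 = base_rate(nu, rho2, prop_high)
    print(f"Pi = {prop_high}: rho1 = {rho1:.1f}, nu/rho1 = {nu / rho1:.4f}")
```

The ratio $\nu/\rho_1$ approaches 1 as $\Pi$ shrinks with $\rho_2/\rho_1$ held fixed, which illustrates how the base rate can stay close to the budgeted rate when high-frequency episodes are rare.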